Abstract. Inside every living cell is the cytoplasm: a fluid mixture of thousands of 
different macromolecules, predominantly proteins. This mixture is where most of the 
biochemistry occurs that enables living cells to function, and it is perhaps the most complex 
liquid on earth. Here we take an inventory of what is actually in this mixture. Recent 
genome-sequencing work has given us for the first time at least some information on all of 
these thousands of components. Having done so we consider two physical phenomena in 
the cytoplasm: diffusion and possible phase separation. Diffusion is slower in the highly 
crowded cytoplasm than in dilute solution. Reasonable estimates of this slowdown can 
be obtained and their consequences explored. For example, monomer-dimer equilibria are 
established approximately twenty times more slowly than in a dilute solution. Phase separation 
in all except exceptional cells appears not to be a problem, despite the high density and 
the consequently strong protein-protein interactions. We suggest that this may be partially a 
byproduct of the evolution of other properties, and partially a result of the huge number of 
components present. 
1. Introduction to the cytoplasm 

Living cells are essentially very complex membranes surrounding equally complex solutions 
of, predominantly, protein molecules. These solutions are arguably the most complex liquids 
we know of. This article will begin with some of the basic questions we can ask about these 
complex liquids, together with some partial answers. Then we will look at two phenomena 
in the cytoplasm that are particularly suited to study by physical scientists: diffusion and 
phase behaviour. Below there is a section on each, and we will end with a brief conclusion. 
In the following section we will look at two aspects of diffusion in the crowded environment 
of the cell. The first is the need to estimate the slowdown due to the high density of protein 
present. The second aspect is cytoplasmic diffusion as a process that has been optimised by 
evolution. If, say, the rate of diffusion limits the speed of the cell's response to a change 
in the environment, then there is natural selection pressure on the proteins to evolve to diffuse 
faster. Section 3 will discuss how we can understand and even calculate some aspects of the 
phase behaviour of models of the cytoplasm, even in the absence of hard data on even one of 
the millions of interactions that occur in the cytoplasm. In the remainder of this introduction 
we will consider some of the basic questions we can ask about the cytoplasm. 

What is in it? A concentrated solution of macromolecules, predominantly protein, but 
also RNA and, in the case of prokaryotes, one or a few huge DNA molecules. Proteins 
are heteropolymers: linear chains of amino acids that are typically folded up 
into a compact, relatively rigid native state that is in many ways more like a colloid 
than a conventional polymer. Prokaryote cells are much simpler than those of eukaryotes. 
Prokaryotes are (relatively) simple organisms such as bacteria, e.g., E. coli. Their cells 
contain only one compartment that contains the DNA, the proteins, the ribosomes where 
new proteins are made etc. See any molecular biology textbook, for example that of 
Alberts et al. [1]. Eukaryote cells are larger and compartmentalised, in particular the DNA 
is in a membrane-bound compartment called the nucleus, not in the cytoplasm. Some 
eukaryotes are single-celled organisms, e.g., yeast, but all complex multicellular organisms, 
e.g., H. sapiens, are eukaryotes. Eukaryote cells have a complex 'skeleton' of filaments of 
protein [1] and not all of the protein diffuses freely in the cytoplasm [2]. We will not discuss 
this further here but it should be borne in mind that the description of the cytoplasm as a 
liquid mixture may be a better approximation in prokaryotes than in eukaryotes. See [3] for 
a review of the properties of the eukaryote cytoplasm. 

Returning to prokaryotes, an inventory of the protein, RNA and DNA in E. coli is 
given in Table 1. For their net electrostatic charges see [4]. The bacterium E. coli has 
been extensively studied and much is known about it [5-7]. The macromolecules occupy 
around 30% to 40% of the volume inside the cell. Of course it is well known that at these 
concentrations the interactions between the molecules are both strong and important. 

What does it do? Living organisms consume energy, grow, move etc. The cytoplasm 
is where most of the energy is consumed and most of the functions necessary to grow 
etc are performed. The cytoplasm also computes: it receives and integrates signals from 
the environment and changes the functions performed accordingly. For example, if the 
environment of E. coli contains the sugar lactose but not glucose then a signal is transmitted 
within the cytoplasm and the synthesis of the enzymes needed to metabolise this sugar is 
switched on [1]. 

            volume (nm^3)   no. of types   no. of molecules   volume fraction
Protein     100             1000           10^6               10%
tRNA        100             10             10^5               1%
Ribosome    10^4            1              10^4               10%
DNA         10^6            1              1                  0.1%

Table 1. The protein, RNA and DNA in the cytosol of E. coli [4-7]. For each class 
of macromolecule, the columns indicate the orders of magnitude of, from left to right: 
the volume of a single molecule of this class, the number of different types of molecule 
in this class, the total number of molecules in the cytoplasm of a cell, and the volume 
fraction occupied by molecules of this class. The cytoplasm has a volume of order 1 µm^3. 
Ribosomes are large complexes of protein and RNA. They are the cell's protein factories. 
tRNA molecules are relatively small RNA molecules that hold an amino acid in readiness for 
it to be added to the growing chain of amino acids that is being synthesised at a ribosome. 
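
As a rough consistency check on Table 1, the volume fraction of each class can be recomputed from the other columns as (volume per molecule) x (number of molecules) / (cytoplasm volume). The sketch below does this in Python, assuming a cytoplasm volume of exactly 10^9 nm^3 (1 µm^3). The names TABLE_1, V_CYTO_NM3 and volume_fraction are illustrative and do not come from the original text.

```python
# Order-of-magnitude entries of Table 1:
# (volume of one molecule in nm^3, total number of molecules in the cell).
TABLE_1 = {
    "Protein": (1e2, 1e6),
    "tRNA": (1e2, 1e5),
    "Ribosome": (1e4, 1e4),
    "DNA": (1e6, 1),
}
V_CYTO_NM3 = 1e9  # assumed cytoplasm volume: 1 um^3 = 1e9 nm^3


def volume_fraction(molecule_volume_nm3, n_molecules, v_cyto_nm3=V_CYTO_NM3):
    """Fraction of the cytoplasm volume occupied by one class of macromolecule."""
    return molecule_volume_nm3 * n_molecules / v_cyto_nm3


fractions = {name: volume_fraction(v, n) for name, (v, n) in TABLE_1.items()}
for name, frac in fractions.items():
    print(f"{name:9s} {100 * frac:g}%")
```

With these order-of-magnitude entries the four classes sum to roughly 20%, which is of the same order as the 30% to 40% quoted in the text.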

How does it compute responses, copy DNA etc? This is a large question, indeed 
essentially all of cell biology is concerned with answering this question. Liquids physicists 
perhaps have most to contribute to processes that either involve transport, such as diffusion, 
or the underlying equilibrium behaviour of the mixture of complex molecules that forms the 
cytoplasm. Thus we will focus on diffusion in section 2 and phase behaviour in section 3. 

2. Diffusion in vivo 

As a first example, let us consider diffusion. This is essential to transmit signals across the 
cytoplasm, for reactants to collide and so on. We want to understand diffusion in vivo, i.e., 
in the cytoplasm, and to do so we will compare diffusion in vivo with diffusion in vitro, by 
which we mean diffusion in the typically very dilute solutions that biochemists study. These 
solutions are so dilute that they can be treated as an ideal gas. 

The properties of the cytoplasm have been optimised by almost four billion years of 
evolution. Thus one approach to understanding the cytoplasm is to consider how it can be 
optimised. See for example the work of Bialek [8] for elegant examples of this approach. If 
we consider diffusion-limited reactions between a pair of proteins A and B, then the reaction 
rate is [9] 

$$\mathrm{Rate} = k N_A N_B / V_{\rm cyto}, \qquad (1)$$

for $N_A$ molecules of protein A and $N_B$ molecules of protein B uniformly distributed within 
a cytoplasm of volume $V_{\rm cyto}$. The reaction constant $k \simeq D r$, where $D$ and $r$ are the 
diffusion constant and the linear dimension of the volume within which the reaction occurs, 
respectively [9]. Thus the reaction rate per A molecule is proportional to $k N_B / V_{\rm cyto}$. 
For the sake of argument, let us guess that the total volume of the cytoplasm $V_{\rm cyto}$ is 
determined by the need to maximise reaction rates such as that of equation (1). Bacteria are 
under strong natural selection pressure to be able to grow rapidly and if reactions like that of 
equation (1) limit this rate then there will be selection pressure to speed up the reaction. At 
fixed numbers of proteins, varying the volume fraction $\phi$ of protein is equivalent to varying 
the volume $V_{\rm cyto}$. Thus, we will search for the value of $\phi$ that maximises reaction rates. 
Now, the reaction rate per A molecule depends on $\phi$ in two ways: i) $N_B / V_{\rm cyto} \propto \phi$, the 
denser the cytoplasm the higher the density of B molecules, and ii) through the reaction 
constant $k = D(\phi) r$, which depends on the density-dependent diffusion constant. 

Thus, the reaction rate per A molecule is proportional to $D(\phi)\phi$ and if the density of 
the cytoplasm is set by the requirement to maximise the reaction rate of diffusion-limited 
reactions then we expect to find cells with a volume fraction that maximises $D(\phi)\phi$. 
Clearly, many other things are going on in a cell that need to be optimised other than 
the rate of diffusion but let us persevere with our naive assumption. We do not know 
the density dependence of the self-diffusion constant in the cytoplasm, although Elowitz et 
al. [10] have measured the diffusion constant for a small protein in E. coli. So, we resort to 
the standard, but drastic physicists' approximation of treating proteins as hard spheres. A 
reasonable approximation to the long-time self-diffusion constant of colloidal hard spheres is 
given by [11] 

$$D(\phi) = D_0 \frac{(1-\phi)^3}{1 + (3/2)\phi + 2\phi^2 + 3\phi^3}, \qquad (2)$$

where $D_0 = kT/6\pi\eta a$ is the Stokes-Einstein expression for the diffusion constant for a 
colloidal particle at infinite dilution. $kT$ is the thermal energy, $\eta$ is the viscosity of the 
solvent, here a salt solution, and $a$ is the radius of the colloidal particle. Equation (2) 
is based on earlier work by Medina-Noyola [12]. Using equation (2) for $D(\phi)$, we find 
that $D(\phi)\phi$ is maximal at $\phi = 0.18$, rather lower than that found inside cells. Also, 
at this volume fraction the self-diffusion constant is 40% of its value at infinite dilution 
whereas the measurements of Elowitz et al. [10] put the diffusion constant of a protein called 
Green Fluorescent Protein (GFP) in E. coli at approximately 10% of its value in a dilute 
solution. Thus our very naive assumption that the cytoplasm is effectively a hard-sphere 
suspension optimised for the reaction rate between pairs of proteins is not consistent with 
the experimental data. However, note that equation (2) predicts that at volume fractions 
$\phi = 0.3$ and $0.4$ the self-diffusion constant is 0.2 and 0.1 times its value at infinite dilution, 
respectively. Thus, given the density of the cytoplasm, the speed of diffusion in the cytoplasm, 
at least of some relatively small proteins, is similar to that in a hard-sphere suspension with 
the same volume fraction. In summary, it is possible that the proteins are mostly not very 
sticky and so on average the interactions are not far from simple hard-sphere-like repulsions, 
but the density of the cytoplasm appears to be too high to be the result of selection for the 
maximum collision rate between proteins. 
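
The optimisation argument above can be checked numerically with a short sketch. It assumes the hard-sphere form $D(\phi)/D_0 = (1-\phi)^3 / (1 + 1.5\phi + 2\phi^2 + 3\phi^3)$ for the long-time self-diffusion constant, chosen here because it reproduces the values quoted in the text (about 0.4, 0.2 and 0.1 at $\phi = 0.18$, 0.3 and 0.4). The function names and the grid search are illustrative choices.

```python
def d_ratio(phi):
    """Assumed long-time self-diffusion ratio D(phi)/D_0 for colloidal hard spheres."""
    return (1.0 - phi) ** 3 / (1.0 + 1.5 * phi + 2.0 * phi**2 + 3.0 * phi**3)


def collision_rate(phi):
    """Quantity proportional to the reaction rate per A molecule, D(phi) * phi."""
    return d_ratio(phi) * phi


# Simple grid search over 0 < phi < 0.6 for the maximum of D(phi) * phi.
grid = [i / 10000 for i in range(1, 6000)]
phi_max = max(grid, key=collision_rate)

print(f"phi maximising D(phi)*phi: {phi_max:.2f}")
for phi in (phi_max, 0.3, 0.4):
    print(f"D({phi:.2f})/D_0 = {d_ratio(phi):.2f}")
```

A grid search is used instead of a root finder purely to keep the sketch dependency-free; the maximum is broad, so the resolution of the grid is not critical.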

The dense protein solution that is the in vivo environment will affect not only the rate 
at which a pair of proteins come together but also the rate at which a dimer will break apart 
- we expect the push of the other molecules will make it harder for the proteins of a dimer 
to move away from each other. To study this effect let us consider that when proteins A and 
B collide and react, with a rate given by equation (1), they form a dimer that then persists 
for some time before dissociating. Then there will be an equilibrium between A and B 
monomers and AB dimers, 

$$A + B \underset{k_b}{\overset{k}{\rightleftharpoons}} AB, \qquad (3)$$

with forward and back rate constants $k$ and $k_b$, respectively. We will study the effect of the 
in vivo environment by comparing dimer formation there with dimer formation in a dilute 
solution in vitro. The in vitro situation is taken to be a typical experimental situation where 
the proteins are so dilute that they are an associating ideal gas. See [13, 14] for the theory 
of associating ideal gases and for association in dense liquids. 

Figure 1. A plot of the concentrations of both the monomers of type A and the AB 
dimers, as a function of time. The dashed and solid curves are $\rho_A$ and $\rho_{AB}$, respectively, 
for the in vivo reaction. The dotted and dot-dashed curves are $\rho_A$ and $\rho_{AB}$, respectively, 
for the in vitro reaction. The parameter values are as described in the text, and in each 
case the initial concentrations were $\rho_A = \rho_B = 10^{-7}$ nm$^{-3}$ and $\rho_{AB} = 0$. As their starting 
concentrations were the same the densities of the B proteins were at all times equal to the 
densities of the A proteins and so we have not plotted the densities of the B proteins. 

Now, let us consider proteins A and B that each exist as 100 copies in a cytoplasm of 
volume $V_{\rm cyto} = 1\,\mu{\rm m}^3$. Thus the total number densities of A and B are 
$\rho_A + \rho_{AB} = \rho_B + \rho_{AB} = 10^{-7}$ nm$^{-3}$, where $\rho_A = N_A/V_{\rm cyto}$, 
$\rho_B = N_B/V_{\rm cyto}$ and $\rho_{AB} = N_{AB}/V_{\rm cyto}$, with $N_{AB}$ the 
number of AB dimers in the cytoplasm. In order to have comparable amounts of the dimer 
and the free A and B proteins in vivo we set the dissociation constant in the cytoplasm 
$K_d^c = 5 \times 10^{-8}$ nm$^{-3}$. Note that the dissociation constant is, by definition, one over the 
equilibrium constant, $K_d = \rho_A \rho_B / \rho_{AB}$. Assuming as above that the interactions can be 
modelled by hard core interactions, and setting the hard-sphere volume fraction $\phi = 0.4$, 
the dissociation constant in vitro, $K_d^v$, is 

$$K_d^v = g_{HS}(\phi) K_d^c, \qquad (4)$$

where $g_{HS}$ is the pair distribution function at contact. Using the Carnahan-Starling [15] 
equation, $g_{HS}(\phi = 0.4) = 3.70$. Then we obtain $K_d^v = 1.85 \times 10^{-7}$ nm$^{-3}$. We are assuming 
that the dimer consists of a pair of touching hard spheres, as in [14]. Equation (4) follows 
directly from the definition of the pair distribution function. The pair distribution function 
is the ratio of the actual probability of finding a pair at a given separation to the probability 
of finding a pair at that separation in the absence of interactions, i.e., in an ideal gas. 
See [13, 14] for the use of pair distribution functions in obtaining the density dependence of 
monomer-dimer equilibria. 
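
The numbers in this paragraph can be reproduced with a few lines of Python. The sketch below assumes the standard Carnahan-Starling contact value $g_{HS}(\phi) = (1 - \phi/2)/(1-\phi)^3$ and equal total densities of A and B, so that the equilibrium condition $\rho_A^2/(\rho_T - \rho_A) = K_d$ is a quadratic in the monomer density. Function and variable names are illustrative.

```python
import math


def g_contact_cs(phi):
    """Carnahan-Starling pair distribution function at contact for hard spheres."""
    return (1.0 - 0.5 * phi) / (1.0 - phi) ** 3


def equilibrium_densities(rho_total, k_d):
    """Monomer and dimer densities for A + B <-> AB with equal totals of A and B.

    Solves rho_A**2 / (rho_total - rho_A) = k_d for the monomer density rho_A.
    """
    rho_a = 0.5 * (-k_d + math.sqrt(k_d**2 + 4.0 * k_d * rho_total))
    return rho_a, rho_total - rho_a


RHO_TOTAL = 1e-7   # nm^-3, 100 copies in 1 um^3
KD_CYTO = 5e-8     # nm^-3, dissociation constant set in the cytoplasm
KD_VITRO = g_contact_cs(0.4) * KD_CYTO  # dilute-solution value, larger by g_HS

for label, k_d in (("in vivo", KD_CYTO), ("in vitro", KD_VITRO)):
    rho_a, rho_ab = equilibrium_densities(RHO_TOTAL, k_d)
    print(f"{label}: K_d = {k_d:.3g} nm^-3, monomer/dimer = {rho_a / rho_ab:.2f}")
```

Under these assumptions the monomer-to-dimer ratio is one in vivo and roughly two and a half in vitro.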

We use $k = D_0 r$ with $r = 1$ nm for the rate constant for the forward reaction in vitro 
and $k = D(\phi = 0.4) r$ in vivo. Taking $T = 298$ K and $\eta = 10^{-3}$ Pa s for water, then using 
the Stokes-Einstein expression the diffusion constant for a protein with diameter 5 nm is 
$D_0 = 87\,\mu$m$^2$ s$^{-1}$. In the cytoplasm, $D(\phi = 0.4) = 8.9\,\mu$m$^2$ s$^{-1}$. These values are similar to 
those for the protein GFP, whose in vitro and in vivo diffusion constants are $87\,\mu$m$^2$ s$^{-1}$ [10,16] 
and $7.7\,\mu$m$^2$ s$^{-1}$ [10], respectively. As $K_d = k_b/k$, once we have specified both the rate 
constant for the forward reaction and the dissociation constant we have the rate constant 
for the back reaction [9]. Here the rate constants for the back reaction, $k_b$, are 0.45 s$^{-1}$ in 
vivo and 16 s$^{-1}$ in vitro. 
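
The arithmetic behind these rate constants is easy to lose in the unit conversions, so a minimal sketch is given here. It assumes the Stokes-Einstein expression with a radius of 2.5 nm, the same assumed hard-sphere form for $D(\phi)/D_0$ that reproduces the ratios quoted earlier, and the dissociation constants $5 \times 10^{-8}$ and $1.85 \times 10^{-7}$ nm$^{-3}$ from the text. All variable names are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T = 298.0            # temperature, K
ETA = 1.0e-3         # viscosity of water, Pa s
RADIUS = 2.5e-9      # protein radius in m (diameter 5 nm)
R_REACT_NM = 1.0     # linear dimension r of the reaction volume, nm

# Stokes-Einstein diffusion constant at infinite dilution, converted to um^2/s.
d0_um2 = K_B * T / (6.0 * math.pi * ETA * RADIUS) * 1e12


def d_ratio(phi):
    """Assumed hard-sphere ratio D(phi)/D_0 for long-time self-diffusion."""
    return (1.0 - phi) ** 3 / (1.0 + 1.5 * phi + 2.0 * phi**2 + 3.0 * phi**3)


d_vivo_um2 = d0_um2 * d_ratio(0.4)

# Forward rate constants k = D r in nm^3/s (1 um^2 = 1e6 nm^2).
k_vitro = d0_um2 * 1e6 * R_REACT_NM
k_vivo = d_vivo_um2 * 1e6 * R_REACT_NM

# Back rate constants from K_d = k_b / k, with K_d in nm^-3.
kb_vivo = 5e-8 * k_vivo
kb_vitro = 1.85e-7 * k_vitro

print(f"D_0 = {d0_um2:.0f} um^2/s, D(0.4) = {d_vivo_um2:.1f} um^2/s")
print(f"k_b in vivo = {kb_vivo:.2f} 1/s, k_b in vitro = {kb_vitro:.0f} 1/s")
```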

It is straightforward to obtain the concentrations $\rho_A$, $\rho_B$ and $\rho_{AB}$ as functions of time 
in vivo and in vitro, by in each case solving the equations 

$$\frac{d\rho_\alpha}{dt} = -k \rho_A \rho_B + k_b \rho_{AB}, \qquad \alpha = A, B, \qquad (5)$$

$$\frac{d\rho_{AB}}{dt} = k \rho_A \rho_B - k_b \rho_{AB}, \qquad (6)$$

after setting the initial conditions. We use the initial condition that the density of AB 
dimers is zero. The solutions to equations (5) and (6) are plotted in figure 1. The reaction 
in the cytoplasm takes around 4 s to reach an equilibrium of equal numbers of monomers 
and dimers, while the reaction in dilute solution takes around 0.2 s to reach an equilibrium 
in which there are two and a half times as many monomers as dimers. Thus although the 
behaviour is qualitatively the same in both cases, quantitatively there is a large difference. 
Of course, we have assumed that the interactions are hard-sphere-like; attractions will alter 
the picture. 

The dissociation constant in vitro is a factor of $g_{HS}(\phi = 0.4) = 3.70$ larger than in 
vivo, which means that taking the proteins A and B out of the in vivo environment and 
putting them in a dilute solution will significantly reduce the number of dimers formed. The 
factor of 3.70 is for a pair of proteins A and B of sizes comparable to the average size of 
the proteins whose crowding is pushing them together. The dissociation constant in vitro 
will be increased by larger factors if the two species are larger than typical proteins. Thus, 
processes that involve the assembly of large complexes should be especially strongly affected 
by removal from the in vivo environment. The copying of DNA is one such process: it is 
done by the cooperative action of a complex of a number of proteins that bind to the double 
helix of DNA. Studies of the copying of DNA [17, 18] found that it was impossible to initiate 
the copying process in vitro unless a concentrated solution of the water-soluble polymer 
polyethylene glycol was added to compensate for the lack of the concentrated solution of 
other proteins found in vivo [17]. This prompted Kornberg to make it one of his ten 
commandments of biochemistry that in vitro systems should compensate for the effect of 
taking the system being studied out of its in vivo home. Minton has written or cowritten a 
number of reviews on the effect of the crowded environment in vivo [19,20]. 
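
Returning to the monomer-dimer kinetics discussed earlier in this section, the time courses can be generated with a short numerical integration. The sketch below assumes the mass-action rate equation $d\rho_A/dt = -k\rho_A\rho_B + k_b\rho_{AB}$ with $\rho_B = \rho_A$ and $\rho_A + \rho_{AB}$ conserved, and uses a hand-written fourth-order Runge-Kutta step so that it needs only the standard library. The function name, step size and end time are illustrative choices; the rate constants are the values quoted in the text.

```python
def simulate(k, k_b, rho0=1e-7, t_end=10.0, dt=1e-3):
    """Integrate d(rho_A)/dt = -k rho_A rho_B + k_b rho_AB with rho_B = rho_A.

    Densities are in nm^-3, k in nm^3/s and k_b in 1/s. Returns lists of
    times, monomer densities and dimer densities.
    """
    def rhs(rho_a):
        rho_ab = rho0 - rho_a  # conservation: rho_A + rho_AB = rho0
        return -k * rho_a * rho_a + k_b * rho_ab

    times, mono, dimer = [0.0], [rho0], [0.0]
    rho_a = rho0
    for step in range(1, int(round(t_end / dt)) + 1):
        # classical fourth-order Runge-Kutta step
        k1 = rhs(rho_a)
        k2 = rhs(rho_a + 0.5 * dt * k1)
        k3 = rhs(rho_a + 0.5 * dt * k2)
        k4 = rhs(rho_a + dt * k3)
        rho_a += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        times.append(step * dt)
        mono.append(rho_a)
        dimer.append(rho0 - rho_a)
    return times, mono, dimer


# Rate constants quoted in the text: k = D r in nm^3/s, k_b in 1/s.
runs = {
    "in vivo": simulate(k=8.9e6, k_b=0.45),
    "in vitro": simulate(k=87e6, k_b=16.0),
}
for label, (times, mono, dimer) in runs.items():
    ratio = mono[-1] / dimer[-1]
    print(f"{label}: monomer/dimer at t = {times[-1]:.0f} s is {ratio:.2f}")
```

The returned lists can be passed to any plotting tool to compare the approach to equilibrium in the two environments.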

3. Phase separation in the cytoplasm 

The cytoplasm of all but exceptional cells seems to be highly stable with respect to phase 
transitions such as demixing or crystallisation. Bacteria such as E. coli can survive rather 
large changes in the physical properties of their environment, such as its osmotic pressure [21], 
without the cytoplasm becoming thermodynamically unstable. Phase-transition phenomena 
have been observed in the cells that make up the lens of the eye [22] but these cells are 
exceptional. They are inert and the cytoplasm is predominantly composed of families of 
proteins called the $\alpha$, $\beta$ and $\gamma$-crystallins [23]. This is very different from the composition of 
Table 1. Phase transitions in lens cells have been studied extensively as they are implicated 
in the formation of cataracts. 

We do not know why phase separation occurs in exceptional cells such as those in the 
lens of the eye but does not seem to occur in prokaryote cells or 'normal' human cells. 
However, here we will briefly consider a speculative explanation for the lack of attractive 
protein-protein interactions that could cause separation into protein-rich and protein-poor 
phases, and also a possible explanation for the lack of demixing into phases with different 
protein compositions. Note also that the selection pressure acts on proteins in their natural 
habitat: the cytoplasm, but affects their in vitro properties. For example, Doye et al. [24] 
have speculated that proteins may be under significant selection pressure not to crystallise 
and that this may contribute to the difficulty protein crystallographers have in crystallising 
proteins. 

Our first speculation is that the stability of the cytoplasm is a byproduct of selection 
for another property. This property may be diffusion in the cytoplasm. We have already 
considered diffusion and the rates of diffusion-limited reactions and found that at least 
for some small proteins the measured diffusion [10] is consistent with hard-sphere-like 
interactions. Clearly, if many protein-protein interactions have been selected to be 
hard-sphere-like in order to speed the diffusion of the protein molecules then this will as 
a byproduct select against separation into protein-rich and protein-poor phases as the 
attractions required for this form of phase transition will be selected against. 

The second speculation is that a demixing phase transition, i.e., phase separation into 
two phases with similar total protein concentrations but different compositions, is suppressed 
by the central-limit theorem of statistics. Phase transitions are driven by interactions and 
so to understand how this might come about we need to consider the effect of interactions 
on the thermodynamic functions of the mixture. At the simplest level the interactions affect 
these functions via the second virial coefficients. At the second-virial-coefficient level the 
excess chemical potential of component $i$ of an $N$-component mixture is 

$$\mu_{Xi} = 2 \sum_{j=1}^{N} B_{ij} \rho_j, \qquad (7)$$

where $B_{ij}$ is the second virial coefficient for the interaction between components $i$ and $j$, 
and $\rho_j$ is the number density of component $j$. 

In earlier work Cuesta and the author assumed that the second virial coefficients 
were independent random variables [25]. Their arguments for taking this apparently rather 
radical step were as follows. Firstly, the cytoplasm may contain thousands of proteins and 
hence even at the level of the second-virial coefficients requires millions of coefficients to 
describe the interactions. We know none of these virial coefficients and so have no choice 
but to guess them. Secondly, work by nuclear physicists on the spectra of complex nuclei 
has shown that by guessing the elements of the Hamiltonian matrix of the nuclei, some 
experimental observations can be reproduced, see for example [26]. Inspired by this work 
Sear and Cuesta replaced the matrix of second virial coefficients by a random matrix. 

Having assumed not only that the $B_{ij}$ are random variables but that they are 
independent, we can easily obtain the probability distribution function of the excess chemical 
potential $\mu_X$. We denote the mean and standard deviation of the $B_{ij}$ by $b$ and $\sigma$, respectively. 
Then the central limit theorem tells us [27] that in the large $N$ limit, the probability 
distribution function for the excess chemical potential of a component in the mixture is 
the Gaussian 

$$P(\mu_X) = \left(2\pi\sigma_X^2\right)^{-1/2} \exp\left[-\left(\mu_X - \bar{\mu}_X\right)^2 / \left(2\sigma_X^2\right)\right], \qquad (8)$$

with mean $\bar{\mu}_X = 2b\sum_j \rho_j = 2b\rho_T$, where $\rho_T$ is the total density, and a standard deviation 
given by $\sigma_X^2 = 4\sigma^2 \sum_j \rho_j^2$. Note that once the second virial coefficients are assumed to be 
random variables, the excess chemical potential is also a random variable, and so a probability 
distribution function is the appropriate description. As the number of components $N$ 
increases at fixed total density $\rho_T$, the individual densities must scale as $1/N$, and so 
the variance $\sigma_X^2$ will also scale as $1/N$: it tends to zero. Thus, as the width of the 
probability distribution for the excess chemical potentials tends to zero as $N \to \infty$, the 
excess chemical potentials of all components tend to the same value. Then, the effect of the 
interactions on the chemical potentials of all components is the same and so the mixture 
behaves as a single-component system in so far as the interactions are concerned. This of 
course rules out demixing into phases that have the same total concentration of protein but 
different compositions. 
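
The narrowing of the distribution of excess chemical potentials with increasing $N$ can be illustrated with a small Monte Carlo sketch. It makes several assumptions that are not in the text: the $B_{ij}$ are drawn from a Gaussian with mean $b$ and standard deviation $\sigma$, each row of coefficients is drawn independently (so the symmetry $B_{ij} = B_{ji}$ is ignored), and all components have the same density $\rho_T/N$. In that case the predicted spread is $\sigma_X = 2\sigma\rho_T/N^{1/2}$.

```python
import math
import random
import statistics


def excess_potential_spread(n_components, b=1.0, sigma=1.0, rho_total=1.0, seed=0):
    """Standard deviation over components of mu_Xi = 2 * sum_j B_ij * rho_j.

    The B_ij are independent Gaussian random variables with mean b and
    standard deviation sigma (an illustrative choice), and every component
    has the same density rho_total / n_components.
    """
    rng = random.Random(seed)
    rho = rho_total / n_components
    mu = [
        2.0 * sum(rng.gauss(b, sigma) * rho for _ in range(n_components))
        for _ in range(n_components)
    ]
    return statistics.pstdev(mu)


for n in (10, 100, 1000):
    measured = excess_potential_spread(n)
    predicted = 2.0 / math.sqrt(n)  # sigma_X = 2 sigma rho_T / sqrt(N)
    print(f"N = {n:4d}: measured {measured:.3f}, predicted {predicted:.3f}")
```

The measured spread is itself a statistical estimate, so it should only be expected to agree with the prediction to within sampling error.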

In addition, if indeed the $B_{ij}$ can be modelled by random variables and the correlations 
between them are weak it suggests that the differences between the excess chemical potentials 
of a given protein in the cytoplasms of different prokaryotes, for example in E. coli and in 
M. tuberculosis, may be small. Although the proteins in the different species may differ in 
many ways, the sum of the effects of these differences is small as the individual effects tend 
to average out. 
In the previous section, we noted that the volume fraction of macromolecules in the 
cytoplasm was around $\phi = 0.4$, which is too high for a virial expansion truncated after the 
second-virial-coefficient terms to be a good description of the free energy. However, our 
result that the excess chemical potentials of the components should become increasingly 
similar as $N$ increases is not restricted to the second-virial-coefficient approximation. Let us 
consider the situation where the excess chemical potential of the $i$th component is given not 
by equation (7) but by the more general expression 

$$\mu_{Xi} = f(\phi) + \sum_{j=1}^{N} \chi_{ij} \rho_j + \sum_{j,k=1}^{N} \chi_{ijk} \rho_j \rho_k, \qquad (9)$$



where the $\chi_{ij}$ and $\chi_{ijk}$ control the component-dependent contributions to the excess chemical 
potential and are independent random variables like the $B_{ij}$, and $f(\phi)$ is some function of 
volume fraction which accounts for component-independent contributions such as excluded 
volume. For large $N$, equation (9) gives a probability-distribution function for the excess 
chemical potential with the same form as equation (8), and with a standard deviation whose 
largest term for large $N$ also scales as $1/N^{1/2}$. Thus, a whole class of mixtures in which the 
interactions between components $i$ and $j$ can be modelled by independent random variables 
behaves as a single-component mixture in the large $N$ limit. This finding is in no way restricted 
to the small volume fractions at which the second-virial-coefficient approximation is accurate. 

The above arguments against demixing are only indicative; a careful analysis of 
demixing in mixtures with virial coefficients that are independent random variables is in [25]. 
The arguments rely on the assumption that the $B_{ij}$ are independent random variables. 
Correlations between a specific property of a protein, namely its size, and its interactions 
were considered by Braun et al. [28] using the theory of polydisperse mixtures (reviewed 
in [29]). They considered, as a model of the mixture of proteins inside cells, a mixture of 
spherical particles with a 'stickiness' between proteins with $l_i$ and $l_j$ amino acids that scales 
as $l_i^{2/3} + l_j^{2/3}$, i.e., the contribution of the stickiness to the second virial coefficient scaled with 
the sum of the surface areas of the interacting proteins. This is reasonable if the surfaces 
of proteins differ weakly from one protein to another, and the attractive interaction when 
these surfaces approach each other is indeed a 'sticky' interaction, i.e., has a range that is 
small in comparison to the diameter of the proteins. Braun et al. used both genome data for 
the numbers of amino acids in all the proteins for a number of organisms, and experimental 
proteomics data for a bacterium that infects salmon, and found that with this model the 
width of the distribution in virial coefficients due to the distribution of protein lengths was 
far too small to induce demixing [28]. Proteomics is the study of the complete set of proteins 
of an organism. The finding that, for a simple model, the systematic variation of 
protein-protein interactions with a property of the proteins, here size, has little effect, broadly 
supports the use of uncorrelated random variables to represent the virial coefficients. 



4. Conclusion 

The solutions inside living cells are perhaps the most complex liquids on earth. They contain 
thousands of complex components and they are non-equilibrium systems. This complexity 
is daunting but as we have seen in the previous section, a statistical approach can be used 
to make progress. Such an approach can be used to model the effect of the interactions in 
the crowded cytoplasm on any property, for example the rate of protein unfolding [30]. 

The very high density of macromolecules in the cytoplasm means that virtually all 
processes that occur there will be significantly affected by interactions. Of course most 
of both the experimental and theoretical studies of liquids are aimed at understanding 
the effects of interactions. So liquid-matter scientists are ideally placed to contribute 
to attempts to understand the behaviour in the cytoplasm, in particular to attempts to 
understand the differences between in vitro and in vivo behaviour as these are similar 
to the differences between ideal gases and dense liquids [19,20]. Finally, in addition to 
the inevitable interactions due to the crowded nature of the cytoplasm there are many 
interactions that are essential to the function of the cell. For example, the receptors E. coli 
cells use to detect nutrient molecules in their environment interact with each other so as 
to make their response cooperative; this has been modelled using an Ising-like model near 
a phase transition [31,32]. These receptors are embedded in the cell membrane, not in the 
cytoplasm but similar interactions may be employed to produce cooperative phenomena in 
the cytoplasm. 